InferSpark: Statistical Inference at Scale

Authors

  • Zhuoyue Zhao
  • Eric Lo
  • Kenny Q. Zhu
  • Chris Liu
Abstract

The Apache Spark stack has enabled fast large-scale data processing. Despite a rich library of statistical models and inference algorithms, it does not give domain users the ability to develop their own models. The emergence of probabilistic programming languages has shown the promise of developing sophisticated probabilistic models in a succinct and programmatic way. These frameworks have the potential to automatically generate inference algorithms for user-defined models and to answer various statistical queries about the model. It is a perfect time to unite these two great directions to produce a programmable big data analysis framework. We thus propose InferSpark, a probabilistic programming framework on top of Apache Spark. Efficient statistical inference can be easily implemented on this framework, and the inference process can leverage the distributed main-memory processing power of Spark. This framework makes statistical inference on big data possible and speeds up the penetration of probabilistic programming into the data engineering domain.


Similar Resources

ZaliQL: Causal Inference from Observational Data at Scale

Causal inference from observational data is a subject of active research and development in statistics and computer science. Many statistical software packages have been developed for this purpose. However, these toolkits do not scale to large datasets. We propose and demonstrate ZaliQL: a SQL-based framework for drawing causal inference from observational data. ZaliQL supports the state-of-the...


ZaliQL: A SQL-Based Framework for Drawing Causal Inference from Big Data

Causal inference from observational data is a subject of active research and development in statistics and computer science. Many toolkits that depend on statistical software have been developed for this purpose. However, these toolkits do not scale to large datasets. In this paper we describe a suite of techniques for expressing causal inference tasks from observational data in SQL. This suit...


Statistical Inference for the Lomax Distribution under Progressively Type-II Censoring with Binomial Removal

This paper considers parameter estimation in the Lomax distribution under progressive type-II censoring with random removals, assuming that the number of units removed at each failure time has a binomial distribution. The maximum likelihood estimators (MLEs) are derived using the expectation-maximization (EM) algorithm. The Bayes estimates of the parameters are obtained using both the squared erro...


Using Probabilistic Views for Large-Scale Statistical Inference

Probabilistic databases extend statistical inference from limited, hand-crafted statistical models to an entire database. Data analysts can discover trends, test hypotheses, and run what-if scenarios by simply running SQL queries. The technical challenge in a probabilistic database is the query processor, which needs to perform a probabilistic inference for every row output by a SQL query: the ...


Review of the Applications of Exponential Family in Statistical Inference

In this paper, after introducing the exponential family and a history of work done by researchers in the field of statistics, some applications of this family in statistical inference, especially in estimation problems, statistical hypothesis testing, and statistical information theory concepts, will be discussed.



Journal title:
  • CoRR

Volume abs/1707.02047  Issue 

Pages  -

Publication date 2015